Contents

1 Quick start

Here we demonstrate a basic workflow starting from either a matrix output of the ‘computeMatrix’ function of deepTools or a soGGi ChIPprofile object, and ending with a heatmap or a boxplot to visualize the quantified signal across samples. Later in this document we will go deeper into the profileplyr object used here and the functions that can be used on this object.

In the code below, a deepTools ‘computeMatrix’ output is imported as a profileplyr object. The following actions are then performed on this object: 1) subset to include only the hindbrain and liver samples, 2) annotate with genes using GREAT, 3) group by range overlap with these gene annotations based on expression in hindbrain or liver, and 4) output as a deepTools matrix:

proplyrObject <- import_deepToolsMat(path_to_matrix)
proplyrObject <- proplyrObject[, , c(1:2)]
proplyrObject <- annotateRanges_great(proplyrObject, species = "mm10") 
proplyrObject <- groupBy(proplyrObject, group = gene_list_dataframe)
export_deepToolsMat(proplyrObject, con = "./deepTools_export")

The pipe function (%>%) from the magrittr package can also be used to simplify this same code:

import_deepToolsMat(path_to_matrix) %>%
  .[ , , c(1:2)] %>%
  annotateRanges_great(species = "mm10") %>%
  groupBy(group = gene_list_dataframe) %>% 
export_deepToolsMat(con = "./deepTools_export")

The output from either of the previous two code chunks can then be used to produce the following figure using the deepTools ‘plotHeatmap’ function.

Code from command line:

plotHeatmap -m deepTools_export.gz -o deepTools_export.jpg –startLabel start –endLabel end –xAxisLabel “distance (bp)”

The same workflow can be done all within R using a ChIPprofile object from soGGi, which like deepTools takes either a bigWig or a BAM file and quantifies the read signal over genomic intervals. Here we also demonstrate the implementation of EnrichedHeatmap with the profileplyr object so that the entire workflow remains in R.

BamBigwig_to_chipProfile(signalFiles, 
                         testRanges, 
                         format = "bigwig") %>%
  as_profileplyr() %>%
  .[, , c(1:2)] %>%
  annotateRanges_great(species = "mm10") %>% 
  groupBy(group = gene_list_dataframe) %>% 
  generateEnrichedHeatmap(extra_annotation_columns = "hindbrain_liv_stat",
                          sample_names = c("Hindbrain",
                                           "Liver"))

Further, the signal in the ranges can be summarized with the mean (or any other summary statistic) of the entire range using the profileplyr summarize() function. This summarized signal can be visualized with ggplot (assuming the ‘output’ argument of summarize() is set to ‘long’).

import_deepToolsMat(path_to_matrix) %>%
  .[, , c(1,2)] %>%
  annotateRanges_great(species = "mm10") %>% 
  groupBy(group = gene_list_dataframe) %>% 
  summarize(fun = rowMeans, 
            output = "long", 
            keep_all_mcols = TRUE) %>%
  ggplot(aes(x = Sample, y = log2(Signal))) + 
    geom_boxplot() + 
    facet_grid(~GLoverlap_names) + 
    scale_x_discrete(labels= c("Hindbrain", "Liver"))

2 Import signal quantification from deepTools or soGGi

There are two main ways to go from a bigWig of BAM file to a profileplyr object, which is a version of the RangedSummarizedExperiment class within the SummarizedExperiment package. A profileplyr object can be generated from the command line with the output of the deepTools ‘computeMatrix’ function, or from within R using the output of the soGGi regionPlot() function.

  1. Starting with output from deepTools ‘computeMatrix’
  2. Starting with output from soGGi

2.1 Starting with output from deepTools ‘computeMatrix’

If you do do not currently have deepTools installed, instructions for installation are in the deepTools manual, with the easiest method likely being through Bioconda.

This direct output from the ‘computeMatrix’ function can be imported into R as a profileplyr object using the import_deepToolsMat() function. This function takes as its only argument the path to the matrix from deepTools. The information contained within this matrix is stored in a profileplyr object.

library(profileplyr)

example <- system.file("extdata", 
                       "example_deepTools_MAT", 
                       package = "profileplyr") 
proplyrObject <- import_deepToolsMat(example) 
## [1] "Matrix contained 'NA' values, these will be replaced with zeros"

2.2 Starting with output from soGGi

The soGGi function plotRegion() allows for the quantification of signal over genomic intervals within R, and the information from this quantification is stored in a ChIPprofile object. profileplyr can take the ChIPprofile object as an input to generate a profileplyr object using the as_profileplyr() function. While the plotRegion function from soGGi does not allow for inputting multiple signal files or multiple BED files, profileplyr contains a function to do this, BamBigwig_to_ChIPprofile(). This function takes a character vector of paths to bigWig files or BAM files in the ‘signalFiles’ argument. Note that the files for each function call must be all BAMs or all bigWigs and not a combination of the two. The ‘testRanges’ argument is must be a character vector of paths to BED files. The corresponding names for each BED file that will appear on visualizations involving the resulting profileplyr object can be set with the ‘testRanges_names’ argument.

library(soGGi)

signalFiles <- c(system.file("extdata", 
                             "hindbrain1_binned.bw", 
                             package = "profileplyr"), 
                 system.file("extdata", 
                             "liver1_binned.bw", 
                             package = "profileplyr"),
                 system.file("extdata", 
                             "kidney1_binned.bw", 
                             package = "profileplyr"))

testRanges <- system.file("extdata", 
                          "ranges.bed", 
                          package = "profileplyr")

chipProfile <- BamBigwig_to_chipProfile(signalFiles, 
                                        testRanges, 
                                        format = "bigwig")
chipProfile
## class: ChIPprofile 
## dim: 763 3001 
## metadata(3): names AlignedReadsInBam group_boundaries
## assays(3): '' '' ''
## rownames(763): giID197 giID198 ... giID762 giID763
## rowData names(4): name score giID sgGroup
## colnames(3001): Point_Centre-1500 Point_Centre-1499 ...
##   Point_Centre1499 Point_Centre1500
## colData names(0):

Once we have the ChIPprofile object, the as_profileplyr() function will generate the profileplyr object that can be used in the many downstream functions described below. Similar to when you import from deepTools, the grouping of the ranges after using the as_profileplyr() function is stored in the ‘rowGroupsInUse’ section of metadata(proplyrObject). Further, if the object is derived from a soGGi ChIPprofile object, the inherited group column will be called ‘sgGroup’.

proplyrObject <- as_profileplyr(chipProfile)
metadata(proplyrObject)$rowGroupsInUse
## [1] "sgGroup"

While the as_profileplyr() function provides a convenient way to go straight from BAM and bigWig files to profileplyr objects, and allows the user to use any argument that can be passed to the soGGi regionPlot() function, it is possible to directly use the regionPlot() function. This can be done in one line of code using pipes (shown below), however, note that the as_profileplyr() function adds the ability to use multiple BED files or GRanges with one command, which cannot be accomplished in the code below.

library(BiocParallel)

testRanges <- import.bed(system.file("extdata", 
                                 "ranges.bed", 
                                 package = "profileplyr"))

proplyrObject <- bplapply(signalFiles,
                          regionPlot,
                          testRanges = testRanges, 
                          format = "bigwig",
                          style="percentOfRegion") %>% 
  do.call(c,.) %>% 
  as_profileplyr(names = "giID")
## [1] "Matrix contained 'NA' values, these will be replaced with zeros"

3 The profileplyr object

The profileplyr object is a form of the RangedSummarizedExperiment class, which allows us to both store all of the relevant information that is imported from soGGi or deepTools, and have flexibility in how we manipulate this object in downstream analysis. We will go through many of these features by looking at the object we just generated from the import functions.

proplyrObject
## class: profileplyr 
## dim: 763 100 
## metadata(3): info sampleData rowGroupsInUse
## assays(3): hindbrain1_binned liver1_binned kidney1_binned
## rownames: NULL
## rowData names(3): names score dpGroup
## colnames: NULL
## colData names(0):

3.1 Key components of the object

The matrices that represent the bigWig or BAM signal within each bin (bin size specified within deepTools, soGGi by default uses base pair resolution, so bin = 1) are contained in a list within the profileplyr object that can be accessed with the assays() function. The columns of each matrix are the bins, while the rows are the ranges from the BED files that were used as the input to deepTools or soGGi, and the dimensions of each matrix can be seen with dim().

library(SummarizedExperiment)

assays(proplyrObject)
## List of length 3
## names(3): hindbrain1_binned liver1_binned kidney1_binned
dim(proplyrObject)
## [1] 763 100

A key feature of the RangedSummarizedExperiment class is the ability to link the rows of the assay matrices to genomic ranges. Those ranges are stored in a GRanges object allowing for efficient integration with other Bioconductor packages. The GRanges object within the profileplyr object can be accessed with the rowRanges() function. Standard GRanges accessor functions such as start(), end(), or ranges() can be used on the entire profileplyr object. Furthermore, as these ranges are annotated by the various functions of profileplyr, information and classifications of these ranges will be stored in this GRanges object in the metadata columns, which can be accessed as a Dataframe with the mcols() function (this same information is also contained in the rowData section of the object, accessed with rowData()).

rowRanges(proplyrObject)[1:3]
## GRanges object with 3 ranges and 3 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr8 33902396-33903372      * |        .         0    genes
##   [2]    chr17 27678695-27679778      * |     ._r1         0    genes
##   [3]    chr17 23546216-23547490      * |     ._r2         0    genes
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

The information associated with each sample is stored as a Dataframe within with the metadata of the profileplyr object, and is accessed with sampleData().

sampleData(proplyrObject)
## DataFrame with 3 rows and 19 columns
##                    upstream downstream      body  bin.size unscaled.5.prime
##                   <numeric>  <numeric> <numeric> <numeric>        <numeric>
## hindbrain1_binned         0          0      1000        10                0
## liver1_binned             0          0      1000        10                0
## kidney1_binned            0          0      1000        10                0
##                   unscaled.3.prime     sample_labels   verbose bin.avg.type
##                          <numeric>          <factor> <logical>     <factor>
## hindbrain1_binned                0 hindbrain1_binned     FALSE         mean
## liver1_binned                    0     liver1_binned     FALSE         mean
## kidney1_binned                   0    kidney1_binned     FALSE         mean
##                   missing.data.as.zero     scale skip.zeros nan.after.end
##                              <logical> <numeric>  <logical>     <logical>
## hindbrain1_binned                FALSE         1      FALSE         FALSE
## liver1_binned                    FALSE         1      FALSE         FALSE
## kidney1_binned                   FALSE         1      FALSE         FALSE
##                   proc.number sort.regions sort.using ref.point
##                     <numeric>     <factor>   <factor>    <list>
## hindbrain1_binned           1         keep       mean      NULL
## liver1_binned               1         keep       mean      NULL
## kidney1_binned              1         keep       mean      NULL
##                   min.threshold max.threshold
##                          <list>        <list>
## hindbrain1_binned          NULL          NULL
## liver1_binned              NULL          NULL
## kidney1_binned             NULL          NULL

The ‘rowGroupsInUse’ accessor for the metadata indicates which column of the range metadata (mcols() or rowRanges()) that will be used for grouping in the final output if that object were used for visualization. For a profileplyr object created from a deepTools computeMatrix output or from a soGGi ChIPprofile object, the inherited groups correspond to the BED files over which the signal was quantified, and these groups are contained in the ‘dpGroup’ column.

metadata(proplyrObject)$rowGroupsInUse
## [1] "dpGroup"

The ‘mcolToOrderBy’ slot of the metadata indicates which column of the range metadata will be used for ordering the ranges as they are exported to either deepTools or to EnrichedHeatmap (with the generateEnrichedHeatmap() function within profileplyr). This can be set using the orderBy() function, which requires a profileplyr object and a character string matching a column name of the range metadata as arguments. If groupBy is never used on an object, the default will be to order by the mean signal of each range (within each group). In addition, until groupBy() has been used on a profileplyr object, the ‘mcolToOrderBy’ slot of the metadata will be NULL and will not be seen with metadata(proplyrObject).

NOTE: If you are exporting to deepTools and want to order by something other than mean range signal (or other deepTools ordering options), then make sure to set the –sortRegions argument of deepTools ‘plotHeatmap’ to “no” so your custom ordering will be used. generateEnrichedHeatmap() will always use the column indicated by metadata(proplyrObject)$mcolToOrderBy, or the mean signal if that value is NULL.

metadata(proplyrObject)$mcolToOrderBy
## NULL
proplyrObject <- orderBy(proplyrObject, "score")
metadata(proplyrObject)$mcolToOrderBy
## [1] "score"

3.2 Subsetting the profileplyr object

The profileplyr object can be subset either by sample, or by rows and columns of the matrix for each sample. This is done using the ‘[ ]’ brackets, with the first position being assay matrix rows, the second position being assay matrix columns, and the third position being the entire matrix for each sample (in addition to the rest of the metadata and range data). For example if you wanted to get the first ten rows and columns of each sample matrix:

proplyrObject[1:10,1:10]
## class: profileplyr 
## dim: 10 10 
## metadata(3): info sampleData rowGroupsInUse
## assays(3): hindbrain1_binned liver1_binned kidney1_binned
## rownames: NULL
## rowData names(3): names score dpGroup
## colnames: NULL
## colData names(0):

The more useful subsetting functionality is likely the ability to subset by samples using the third position of the bracket. For example, if you only wanted the first two samples to export to a deepTools matrix or for further downstream analysis, you could simply just subset with the third position:

proplyrObject[ , , 1:2]
## class: profileplyr 
## dim: 763 100 
## metadata(3): info sampleData rowGroupsInUse
## assays(2): hindbrain1_binned liver1_binned
## rownames: NULL
## rowData names(3): names score dpGroup
## colnames: NULL
## colData names(0):

3.3 Changing sample names

It might be helpful to change the sample names to something that is shorter to make labeling of figures clearer. This can be done separately in most of the visualization packages that a profileplyr object can be used in, but you can change the names within in the sampleData it will be changed for all downstream analyses. After importing either a deepTools matrix or a soGGi object, the sample names are stored as the rownames of the sampleData Dataframe.

rownames(sampleData(proplyrObject))
## [1] "hindbrain1_binned" "liver1_binned"     "kidney1_binned"

The names can simply be changed by reassigning the rownames of this Dataframe.

rownames(sampleData(proplyrObject)) <- c("Hindbrain",
                                         "Liver",
                                         "Kidney")

Importantly, this also changes the names of the matricies stored in the assays() section of the profileplyr object, and the names used for any output method will also be changed (e.g. deepTools, EnrichedHeatmap, ggplot)

3.4 Connecting profileplyr functions using the pipe (%>%) operator

Code involving profileplyr objects can be made clearer using the pipe operator from the magrittr package. This is aided by the fact that the profileplyr object is both the output and input of most functions within the package. This is demonstrated in the Quick Start section, as well as some of the more complex visualizations later in this vignette. The user can go all the way from importing a deepTools matrix or soGGi object, through annotation and grouping, and ending with export to either the generateEnrichedHeatmap() (shown below) or export_deepToolsMat() function

4 Export/Conversion of profileplyr object for visualization

The profileplyr object allows for the flexibility to be exported to various types of visualization, both inside and outside of R. This flexibility coupled with the annotation and subsetting functions described later in this vignette provide great potential for quickly navigating through data sets. There are two main ways to visualize this data once the data is stored in a profileplyr object, first as a heatmap of signal over the individual bins within these genomic ranges (e.g. deepTools and EnrichedHeatmap), and second as summarized signal (e.g. mean of the entire range) in ggplot or clustered heatmaps (e.g. pheatmap).

For heatmap visualization of the signal over the genomic intervals this object can be converted to:

  1. A matrix that can be used as an input for the ‘plotHeatmap’ function of deepTools
  2. A list of matrices that can be used in the EnrichedHeatmap() function within R
  3. A heatmap containing all range grouping and annotation information (using a modified implementation of the EnrichedHeatmap constructor)

4.1 Export to a deepTools matrix

Any profileplyr object can be exported as a matrix formatted for the deepTools ‘plotHeatmap’ function using the export_deepToolsMat() function. This simply requires the name of the profileplyr object and the path to the desired destination of the gzipped matrix. This function will build the metadata for the deepTools matrix based on the metadata in the profileplyr object. Importantly, the range metadata column of profileplyr object (accessed with rowRanges(object)) that is specified in the ‘rowGroupsInUse’ section of metadata for the whole object (found using metadata(object)$rowGroupsInUse) is automatically used to set the grouping of the exported deepTools matrix. The ‘rowGroupsInUse’ can be changed using the groupBy() function (see more details below).

export_deepToolsMat(proplyrObject, "test_output")

4.2 Convert to an EnrichedHeatmap matrix

To generate a heatmap within R using the EnrichedHeatmap package, a special type of matrix of the ‘normalizedMatrix’ class must be used. The convertToEnrichedHeatmapMat() function takes a profileplyr object as the only required input, and then converts the matrices contained within the assays slot of the profileplyr object to a list of ‘normalizedMatrix’ class objects that can be used in the EnrichedHeatmap() function. See the EnrichedHeatmap vignette for detailed examples for how heatmaps can be concatenated together to visualize all of the data in one figure. Importantly, the metadata stored in the rowRanges slot of the profileplyr object can also be utilized to annotate these heatmaps. NOTE: While this function provides the most flexibility for the user to build a custom set of heatmaps, a more useful function for quick visualization of data stored in a profileplyr object is the generateEnrichedHeatmap() function, which builds these heatmaps with one command using the profileplyr object (see the following section).

library(EnrichedHeatmap)
EH_mat <- convertToEnrichedHeatmapMat(proplyrObject)
EH_mat[[1]]

EnrichedHeatmap(EH_mat[[1]], 
                name = names(EH_mat[1]), 
                column_title = names(EH_mat[1])) +
  EnrichedHeatmap(EH_mat[[2]], 
                  name = names(EH_mat[2]), 
                  column_title = names(EH_mat[2])) +
    EnrichedHeatmap(EH_mat[[3]], 
                  name = names(EH_mat[3]), 
                  column_title = names(EH_mat[3]))

# heatmap not included

4.3 Directly generate a customized EnrichedHeatmap with annotated ranges

profileplyr also contains the generateEnrichedHeatmap() function, which takes a profileplyr object and produces an EnrichedHeatmap that can be easily modified to incorporate metadata from the object. So far the profileplyr object used in this vignette is relatively basic (no groups or additional metadata), but this function generates a multipanel heatmap with a variety of arguments that have been tailored to visualizing the profileplyr object. As we explore more functionality within profileplyr, some of these features will be demonstrated.

By default, if the ranges are divided into multiple groups (e.g. multiple BED files were input) the heatmap will automatically be divided and annotated to reflect these groups. Here we have no groups so we can set the ‘include_group_annotation’ argument to be FALSE. If this were set to TRUE, then colored annotation bars would exist to the left of the heatmaps. The maximum value for the y-axis in the line plots at the top of each heatmap are also automatically set based on the highest mean range signal from all of the samples. This can be set manually with the ‘ylim’ argument, or if ylim = NULL, the maximum will be inferred (i.e. be different) for each individual heatmap.

generateEnrichedHeatmap(proplyrObject, include_group_annotation = FALSE)

4.4 Summarizing signal for ggplot or heatmap visualization

Oftentimes it will be important to use some kind of summary statistic for the entire range when trying to understand and visualize differences between samples. This is often going to be done using the summarize() function. The summarize function requires the name of a profileplyr object, the function used to summarize the ranges (e.g. rowMeans or rowMax), and the type of output. Here the basic options will be demonstrated with the example data which contains no groups, however, note how it is used later in the vignette or in the quick start when additional grouping options are discussed and summarizing the data in this way becomes especially useful.

4.4.1 Matrix output with the summarize() function

If the ‘output’ argument is set to ‘matrix’, then only a matrix will be returned with a single column for each sample containing the bins summarized as indicated with the ‘fun’ argument. The row names of this matrix is a unique identifier for each range containing the chromosome, start, end, and group.

object_sumMat <- summarize(proplyrObject, 
                           fun = rowMeans, 
                           output = "matrix") 
object_sumMat[1:3, ]
##                               Hindbrain      Liver    Kidney
## chr8_33902396_33903372_genes  0.1878635 0.11450865 0.3375368
## chr17_27678695_27679778_genes 0.6568317 0.08386143 0.1087628
## chr17_23546216_23547490_genes 0.5972133 1.05301136 0.6422771

This matrix can be used directly in other heatmap generating packages, including pheatmap.

library(pheatmap)
pheatmap(object_sumMat, 
         scale = "row", 
         show_rownames = FALSE)

4.4.2 Long output for ggplot with the summarize() function

If the ‘output’ argument is set to ‘long’, then the output will be a long data frame that can be used for plotting with ggplot. The grouping column of the range metadata as specified by ‘metadata(proplyrObject)$rowGroupsInUse’ will automatically be included in the data frame. If the other range metadata columns should be included in the data frame, then the ‘keep_all_mcols’ argument should be set to TRUE. Additionally, columns specifying the range, as well as the sample and the summarized signal that correspond to that range are included by default.

proplyrObject_long <- summarize(proplyrObject, 
                                fun = rowMeans, 
                                output = "long") 
proplyrObject_long[1:3, ]
##   dpGroup         combined_ranges    Sample    Signal
## 1   genes  chr8_33902396_33903372 Hindbrain 0.1878635
## 2   genes chr17_27678695_27679778 Hindbrain 0.6568317
## 3   genes chr17_23546216_23547490 Hindbrain 0.5972133

This data frame can then be used directly with ggplot for plotting.

It is often helpful to log transform the signal to more clearly see trends in the signal that is quantified.

ggplot(proplyrObject_long, aes(x = Sample, y = log(Signal))) + 
       geom_boxplot()

4.4.3 profileplyr object output with the summarize() function

Lastly, if the ‘output’ argument is set to ‘object’, then a profileplyr object containing the summarized matrix will be returned. This will allow for further grouping or manipulation of the summarized ranges with other profileplyr functions, as opposed to using the binned ranges that are often used in the examples below.

proplyrObject_summ <- summarize(proplyrObject, 
                                fun = rowMeans, 
                                output = "object")
assays(proplyrObject_summ)[[1]][1:3, ]
##      Hindbrain      Liver    Kidney
## [1,] 0.1878635 0.11450865 0.3375368
## [2,] 0.6568317 0.08386143 0.1087628
## [3,] 0.5972133 1.05301136 0.6422771
rowRanges(proplyrObject_summ)[1:3]
## GRanges object with 3 ranges and 3 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr8 33902396-33903372      * |        .         0    genes
##   [2]    chr17 27678695-27679778      * |     ._r1         0    genes
##   [3]    chr17 23546216-23547490      * |     ._r2         0    genes
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

5 Annotating of genomic ranges with clusters, genomic regions, and genes

5.1 K-means and hierarchical clustering of the genomic ranges

Clustering of the signal across the genomic intervals of interest can shed light on patterns that aren’t immediately apparent when looking at the data as a whole. The clusterRanges() function provides a framework to cluster ranges that are contained in a profileplyr object. It should be noted that clustering can also be performed outside of R within the deepTools ‘plotHeatmap’ function, however, profileplyr clustering allows for integration of this clustering with further grouping mechanisms and visualizations within R and profileplyr.

clusterRanges() takes a profileplyr object as its first argument as well as a function to summarize the signal in each range (similar to the summarize() function above). The pheatmap package is then used to cluster the ranges, and the type of clustering depends on whether the user inputs a value for ‘kmeans_k’ (for kmeans clustering) or ‘cutree_rows’ (for hierarchical clustering using hclust). An integer value entered for either of these arguments will specify the number of clusters used for each method. If both ‘kmeans_k’ or ‘cutree_rows’ are left as NULL (default), then a heatmap will be printed with hierarchical clustering, but no distinct clusters defined, and no profileplyr object will be returned. This might be helpful as an initial and quick look at the ranges or as a means to determine the number of clusters to try.

clusterRanges(proplyrObject, 
              fun = rowMeans)

# this code prints heatmap (does not return), but heatmap not shown here to save space

If an integer is entered for either ‘kmeans_k’ or ‘cutree_rows’, then a profileplyr object will be returned with a cluster assigned to each range, and a column is added to the range metadata with the cluster for each range. Further, the ‘rowGroupsInUse’ is changed to this column, meaning that if this object is exported as a deepTools matrix with the export_deepToolsMat() function, the heatmap generated by ‘plotHeatmap’ will be grouped by the clusters.

set.seed(0)
kmeans <- clusterRanges(proplyrObject, 
                        fun = rowMeans, 
                        kmeans_k = 4)
## [1] "K means clustering used."
## [1] "A column has been added to the range metadata with the column name 'cluster', and the 'rowGroupsInUse' has been set to this column."
rowRanges(kmeans)[1:3]
## GRanges object with 3 ranges and 4 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr8 33902396-33903372      * |        .         0    genes
##   [2]    chr17 27678695-27679778      * |     ._r1         0    genes
##   [3]    chr17 23546216-23547490      * |     ._r2         0    genes
##         cluster
##       <ordered>
##   [1]         4
##   [2]         3
##   [3]         1
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths
metadata(kmeans)$rowGroupsInUse
## [1] "cluster"

To visually inspect the clusters, the heatmap can also be printed (though it will not be returned) if the ‘silent’ argument is set to FALSE. A profileplyr object will still be returned even if silent is set to FALSE.

Kmeans clustering:

set.seed(0)
kmeans <- clusterRanges(proplyrObject, 
                        fun = rowMeans, 
                        kmeans_k = 4, 
                        silent = FALSE)
## [1] "K means clustering used."

## [1] "A column has been added to the range metadata with the column name 'cluster', and the 'rowGroupsInUse' has been set to this column."

Hierarchical Clustering:

hclust <- clusterRanges(proplyrObject, 
                        fun = rowMeans, 
                        cutree_rows = 4, 
                        silent = FALSE)
## [1] "Hierarchical clustering used. It is advised to avoid this option with large matrices as the clustering can take a long time. Kmeans is more suitable for large matrices."

## [1] "A column has been added to the range metadata with the column name 'cluster', and the 'rowGroupsInUse' has been set to this column."

5.2 Output matrix with cluster information to deepTools

The profileplyr object that is the output of clusterRanges() can be exported as a deepTools matrix, which can then be used as an input for plotHeatmap.

export_deepToolsMat(kmeans, "kmeans_cluster_mat")

This matrix can then be passed directly to ‘plotHeatmap’, either by using the terminal or by using the system() function in R.

Code from command line :

plotHeatmap -m kmeans_cluster_mat.gz -o kmeans_cluster_mat.jpg –samplesLabel Hindbrain Liver Kidney –startLabel start –endLabel end –xAxisLabel “distance (bp)”

5.3 Generate group-annotated heatmap in R directly with generateEnrichedHeatmap()

After clustering the profileplyr object can be passed directly as an argument into the generateEnrichedHeatmap() function, and by default the heatmap will be grouped and annotated by these clusters, which were automatically set as the ‘rowGroupsInUse’ in the clusterRanges() function. Assuming that the ‘include_group_annotation’ argument of this function is set to the default value of TRUE, whichever metadata column is set to the ‘rowGroupsInUse’ slot will be used for the grouping and annotation seen below.

generateEnrichedHeatmap(kmeans)

5.4 Visualize mean range signal for each cluster with ggplot

An alternative way to visualize the clusters is by summarizing the signal over each range and then plotting those means by cluster. The flexibility with grouping and annotating within profileplyr makes this relatively easy. This is also a good demonstration of using piped code, going all the way from import to clustering, then to summarizing by mean range signal, and finishing with ggplot visualization:

library(magrittr)

set.seed(0)
import_deepToolsMat(example) %>%
  clusterRanges(fun = rowMeans, 
                kmeans_k = 4, 
                silent = TRUE) %>% 
  summarize(fun = rowMeans, 
            output = "long") %>% 
  ggplot(aes(x = Sample, y = log(Signal))) + 
         geom_boxplot() + 
         facet_grid(~cluster) + 
         theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
         scale_x_discrete(labels= c("Hindbrain", "Liver", "Kidney"))

5.5 Gene annotation of ranges

Understanding the genes that are in the proximity to a particular set of ranges allows for potential understanding of functional consequences of signal patterns within these ranges. By annotating the ranges within a profileplyr object with genes, the user can connect signal of particular samples to specific genes and the functions attributed to those genes. profileplyr provides two methods of annotating the ranges within the object using two established Bioconductor packages, ChIPseeker and rGREAT.

5.5.1 Annotation of ranges with genes and genomic regions using ChIPseeker

The annotateRanges() function passes the ranges of a profileplyr object to the annotatePeak() function of ChIPseeker, and then integrates the information from this output into the profileplyr object. The annotation of the ranges, including the annotation type (promoter, exon, etc) and closest gene, are then compiled in the range metadata. In addition to the profileplyr object, a TxDb object must be specified with the ‘TxDb’ argument. The TxDb argument can be one of three things:

  1. a character string that is either “hg19”, “hg38”, “mm9”, or “mm10”, which will automatically retrieve the relevant TxDb object
  2. a TxDb object
  3. a character string that is a path to a GTF or GFF file, in which case a TxDb object will be made using the makeTxDbFromGFF() function from the Genomic Features package.

Note that the default action of this function will not change the ‘rowGroupsInUse’ slot of the metadata. However, if changeGroupToAnnotation = TRUE, a newly generated metadata column will be added that combines the inherited grouping of the ranges and the genomic annotation that was determined by ChIPseeker. If this object is exported as a deepTools matrix, the heatmap will group the ranges first by the inherited grouping, followed by the annotation groups (e.g. promoter, intron, etc). This can be modified by setting the ‘heatmap_grouping’ argument to ‘annotation’, which will force the ranges to be grouped first by annotation, followed by the inherited grouping.

anno <- annotateRanges(proplyrObject, 
                       TxDb = "mm10")
## >> preparing features information...      2019-03-29 01:03:18 
## >> identifying nearest features...        2019-03-29 01:03:19 
## >> calculating distance from peak to TSS...   2019-03-29 01:03:19 
## >> assigning genomic annotation...        2019-03-29 01:03:19 
## >> adding gene annotation...          2019-03-29 01:03:28 
## >> assigning chromosome lengths           2019-03-29 01:03:29 
## >> done...                    2019-03-29 01:03:29
rowRanges(anno)[1:3]
## GRanges object with 3 ranges and 16 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr8 33902396-33903372      * |        .         0    genes
##   [2]    chr17 27678695-27679778      * |     ._r1         0    genes
##   [3]    chr17 23546216-23547490      * |     ._r2         0    genes
##                                      annotation   geneChr geneStart
##                                     <character> <integer> <integer>
##   [1]  Intron (uc009lkk.2/19663, intron 1 of 8)         8  33782644
##   [2]  Intron (uc008bpj.2/23969, intron 1 of 9)        17  27685239
##   [3] Exon (uc033hbl.1/uc033hbl.1, exon 1 of 1)        17  23550799
##         geneEnd geneLength geneStrand      geneId transcriptId distanceToTSS
##       <integer>  <integer>  <integer> <character>  <character>     <numeric>
##   [1]  33929863     147220          2       19663   uc009lkk.2         26491
##   [2]  27711117      25879          1       23969   uc008bpm.2         -5461
##   [3]  23561634      10836          1      320020   uc056zdm.1         -3309
##                  ENSEMBL        SYMBOL
##              <character>   <character>
##   [1] ENSMUSG00000031586         Rbpms
##   [2] ENSMUSG00000040276       Pacsin1
##   [3] ENSMUSG00000117322 6330415G19Rik
##                                                        GENENAME
##                                                     <character>
##   [1]           RNA binding protein gene with multiple splicing
##   [2] protein kinase C and casein kinase substrate in neurons 1
##   [3]                                RIKEN cDNA 6330415G19 gene
##       annotation_short
##              <ordered>
##   [1]           Intron
##   [2]           Intron
##   [3]             Exon
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

Lastly, if only a subset of annotation types are desired (e.g. just those ranges within promoters) then the ‘annotation_subset’ argument can be used and the profileplyr object will be subset accordingly.

5.5.1.1 Adding metadata information beyond the grouping column to the EnrichedHeatmap

The EnrichedHeatmap package brings a lot of heatmap annotation options, and the profileplyr generateEnrichedHeatmap() function takes advantage of this by giving the user the ability to add both categorical and numeric annotations on the right side of the matrix heatmaps. For example, the ChIPseeker annotations added to the range metadata in the previous example can be visualized. Adding metadata columns is done using the ‘extra_annotation_columns’ argument of generateEnrichedHeatmap(). This argument takes a character vector with strings that match column names in the range metadata, with no limit on the number of columns that can be visualized. Doing this below, it immediately becomes clear that many of the ranges with the highest signal are in promoters, which is expected for ATAC-seq signal.

generateEnrichedHeatmap(anno, 
                        extra_annotation_columns = "annotation_short",
                        include_group_annotation = FALSE)

In addition to being exported to a deepTools matrix or Enrichedheatmap, these annotations from annotateRanges() can be further utilized with the groupBy() function, discussed below.

5.5.2 Annotation of ranges with genes using GREAT

Another common method of annotating genomic intervals with nearby genes is by using the GREAT method, which defines regulatory regions for each gene and then determines whether a certain genomic interval overlaps with that genic regulatory region. The profileplyr function annotateRanges_great() takes a profileplyr object and the genome that is to be used for the annotation, and will add a column to the range metadata to indicate whether that range overlaps with the regulatory region of a gene. If a region is in regulatory regions of multiple genes, then it will have multiple rows within the profileplyr object. This function will not change the grouping of the profileplyr object.

anno_great <- annotateRanges_great(proplyrObject, 
                                   species = "mm10")
rowRanges(anno_great)[1:3]
## GRanges object with 3 ranges and 5 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr1 14181321-14182231      * |   ._r487         0    genes
##   [2]     chr1 14181321-14182231      * |   ._r487         0    genes
##   [3]     chr1 16201828-16202701      * |   ._r287         0    genes
##            SYMBOL distanceToTSS
##       <character>     <numeric>
##   [1]        Eya1        128423
##   [2]        Xkr9        513005
##   [3]       Rdh10         96491
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

6 Grouping ranges by range metadata, gene list, or additional GRanges

The groupBy() function within profileplyr allows for a range of grouping mechanisms. The groupBy() function always requires a profileplyr object and the ‘group’ argument, which will determine how the ranges are to be grouped. There are three options for grouping:

  1. Switch output grouping columns within existing range metadata
  2. Group by user-supplied GRanges
  3. Group by user-supplied gene list

6.1 Switch output grouping columns within existing range metadata

If the ‘group’ argument is a character string then it must match a name of one of the columns in mcols(proplyrObject). In this case the groupBy function will change the column that will be used for grouping a profileplyr object during visualization. While many of the profileplyr functions shown above will set the grouping column (specified in the ‘rowGroupsInUse’ section of metadata(proplyrObject)) to the appropriate column for that function, the user may want to use another column for the grouping of the deepTools matrix to be output. The user may have added an additional column to the range metadata that can be used, or a column that is generated in one of the above functions, but is not the default grouping column, might be useful. For example, the default grouping for the annotateRanges() function is a combination of the inherited groups and the annotations, however, maybe the user wants to disregard the inherited grouping of ranges and group all of the ranges by the new annotation. In this case they would change the grouping to the ‘annotation_short’ column, as shown below.

anno <- annotateRanges(proplyrObject,
                       TxDb = "mm10")
metadata(anno)$rowGroupsInUse
## [1] "dpGroup"
anno_switchGroup <- groupBy(anno, 
                            group = "annotation_short")
metadata(anno_switchGroup)$rowGroupsInUse
## [1] "annotation_short"

6.2 Group by user-supplied GRanges

If the ‘group’ argument is a GRanges or a GRangesList then the overlap between each GRanges in this list and the ranges of the profileplyr object will be determined. A column will be added to the range metadata (column name is ‘group_and_overlap’) that shows a combination of the inherited group and whether the range overlaps with any of the GRanges. The ‘rowGroupsInUse’ will be changed to this column by default. If the inherit_groups argument is set to FALSE, then only the overlaps with the GRanges will determine the grouping.

One consideration with this function is what should be done with ranges that overlap multiple GRanges. The default behavior of this function is to duplicate the range and place it in each overlap group for each GRanges. If separateDuplicated = TRUE, then separate groups will be generated for the ranges that overlap multiple GRanges so that every range is in the resulting profileplyr object exactly once. This option is discouraged with high numbers of input GRanges, as the combinations of overlapping groups can get quite large. Lastly, the non-overlapping regions can be included in the heatmap if the ‘include_nonoverlapping’ argument is set to TRUE.

Here we import BED files containing the top 5000 H3K27ac peaks from hindbrain and liver using the rtracklayer package. These peaks sets were adapted from BED files that were downloaded from ENCODE using the provided links. The GRanges are then combined into a GRangesList to be used in the ‘group’ argument for the groupBy() function.

library(rtracklayer)

# import bed files as GRanges
hindbrain_K27ac <- import.bed(system.file("extdata", 
                                          "K27ac_hindbrain_top5000.bed", 
                                          package = "profileplyr") )
liver_K27ac <- import.bed(system.file("extdata", 
                                      "K27ac_liver_top5000.bed", 
                                      package = "profileplyr") )
# make GRangesList
K27ac_GRlist <- GRangesList(hindbrain_K27ac, 
                            liver_K27ac)
K27ac_GRlist_names <- c("hindbrain_K27ac", 
                        "liver_K27ac")

# subset the profileplyr object to just hindbrain and liver, 
# and group by GRanges
K27ac_groupByGR <- proplyrObject[,,grepl("Hindbrain|Liver", 
                                         names(assays(proplyrObject)))] %>%
                   groupBy(group = K27ac_GRlist,
                           GRanges_names = K27ac_GRlist_names, 
                           include_nonoverlapping = FALSE,
                           inherit_groups = FALSE)
rowRanges(K27ac_groupByGR)[1:3]
## GRanges object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]    chr16 23520075-23521344      * |     ._r4         0    genes
##   [2]    chr17 55445046-55446488      * |    ._r15         0    genes
##   [3]    chr16 77399830-77401368      * |    ._r21         0    genes
##       overlap_matrix GRoverlap_names   group_and_GRoverlap        name
##             <matrix>       <ordered>             <ordered> <character>
##   [1]            1:0 hindbrain_K27ac genes_hindbrain_K27ac        <NA>
##   [2]            1:0 hindbrain_K27ac genes_hindbrain_K27ac        <NA>
##   [3]            1:0 hindbrain_K27ac genes_hindbrain_K27ac        <NA>
##         score.1
##       <numeric>
##   [1]       132
##   [2]       145
##   [3]       194
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

As with any profileplyr object, this can be exported to deepTools ‘plotHeatmap’ using the profileplyr function export_deepToolsMat()

export_deepToolsMat(K27ac_groupByGR, 
                    con = "K27ac_GRoverlap.gz")

This matrix can then be passed directly to ‘plotHeatmap’, either by using the terminal or by using the system() function in R.

Code from command line:

plotHeatmap -m K27ac_GRoverlap.gz -o K27ac_GRoverlap.jpg –samplesLabel Hindbrain Liver Kidney –startLabel start –endLabel end –xAxisLabel “distance (bp)”

The profileplyr object can also be passed directly to the generateEnrichedHeatmap() function to be visualized in R.

generateEnrichedHeatmap(K27ac_groupByGR)

6.3 Group by user-supplied gene list

If the ‘group’ argument is a list (i.e. class(group) returns “list”), then it is assumed that this list contains lists of gene sets. These gene sets within this list can either be character vectors of gene symbols, or data frames with the gene symbols as rownames. Importantly, if all of the elements of the list are data frames AND they all have the same columns (as determined by having matching column names), then those columns will be added to the range metadata to potentially be used to annotate the ranges. This is especially useful for annotating the ranges with additional gene-level metadata (e.g. gene expression). However, if any of the elements of the list are character vectors, or if they are data frames with columns that are different, then the additional columns will not be included in the range metadata and only the overlap information will be stored.

The gene symbols will be used to determine overlap with the genes that are associated with each range of the profileplyr object. This type of grouping will typically follow one of the annotation functions: annotateRanges() or annotateRanges_great(). The names of the list elements (names(list)) will be used as the names of each set of genes in the range metadata and any exported deepTools matrix. A new column will be added to the range metadata that shows a combination of the inherited group and whether the range overlaps with any of the gene sets.

6.3.1 Gene set list contains a character data

First we read in a list of gene sets that are character vectors. This list contains the top 1000 differentially expressed genes that are higher in the liver compared to hindbrain, and the top 1000 that are higher in the hindbrain compared to the liver.

gene_list_hindbrain_liv <- system.file("extdata", 
                                            "gene_list_hindbrain_liv.RData", 
                                            package = "profileplyr") 

gene_list_character <- readRDS(gene_list_hindbrain_liv) 
names(gene_list_character)
## [1] "upInHindbrain_VS_Liver"   "downInHindbrain_VS_Liver"
head(gene_list_character[[1]])
## [1] "Tuba1a" "Agap1"  "Nnat"   "Col2a1" "Rtn1"   "Map1b"

Then we subset the profileplyr object to only include the signal data for hindbrain and liver. This is followed by gene annotation of these ranges using the ‘annotateRanges_great’ function which utilizes the rGREAT package. The resulting profileplyr object has range metadata columns that have gene annotation (‘SYMBOL’ column) and a column with the gene set that each range overlapped combined with the inherited grouping (‘group_and_overlap’ column).

proplyrObject_HB_Liv <- proplyrObject[ , , grepl("Hindbrain|Liver", 
                                                 names(assays(proplyrObject)))]
proplyrObject_HB_Liv <- annotateRanges_great(proplyrObject_HB_Liv, 
                                             species = "mm10") 

proplyrObject_HB_Liv <- groupBy(proplyrObject_HB_Liv, 
                                group = gene_list_character)
rowRanges(proplyrObject_HB_Liv)[1:3]
## GRanges object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |    names     score  dpGroup
##          <Rle>         <IRanges>  <Rle> | <factor> <numeric> <factor>
##   [1]     chr1 16201828-16202701      * |   ._r287         0    genes
##   [2]     chr1 20867186-20868863      * |   ._r211         0    genes
##   [3]     chr1 24107322-24109025      * |    ._r88         0    genes
##            SYMBOL distanceToTSS overlap_matrix        GLoverlap_names
##       <character>     <numeric>       <matrix>              <ordered>
##   [1]       Stau2        317041            1:0 upInHindbrain_VS_Liver
##   [2]       Paqr8        -22597            1:0 upInHindbrain_VS_Liver
##   [3]      Col9a1        -69436            1:0 upInHindbrain_VS_Liver
##                group_and_GLoverlap
##                          <ordered>
##   [1] genes_upInHindbrain_VS_Liver
##   [2] genes_upInHindbrain_VS_Liver
##   [3] genes_upInHindbrain_VS_Liver
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

We can export this to deepTools:

export_deepToolsMat(proplyrObject_HB_Liv, 
                    con = "great_groupByGene_HB_Liv")

This matrix can then be passed directly to ‘plotHeatmap’, either by using the terminal or by using the system() function in R . Code from command line:

“plotHeatmap -m great_groupByGene_HB_Liv.gz -o great_groupByGene_HB_Liv.jpg” –startLabel start –endLabel end –xAxisLabel “distance (bp)”

6.3.2 Gene set list with data frames for heatmap annotation (using pipe - %>%)

This time we will use the same list of genes as before, but this time they are in data frames with a column containing gene expression changes for hindbrain vs liver (they are values from the ‘stat’ column of a DESeq2 results table). This allows us to add a heatmap to our visualization showing these expression changes. Also note that the generateEnrichedHeatmap() function gives the ability to manually change the sample names (previously we had shortened them by changing the rownames of sampleData(proplyrObject))

gene_list_hindVliver_df <- system.file("extdata", 
                                       "gene_list_hindbrain_liv_df.RData", 
                                       package = "profileplyr") 

gene_list_dataframe <- readRDS(gene_list_hindVliver_df) 
names(gene_list_dataframe)
## [1] "upInHindbrain_VS_Liver"   "downInHindbrain_VS_Liver"
head(gene_list_dataframe[[1]])
##        hindbrain_liv_stat
## Tuba1a           75.80852
## Agap1            53.32196
## Nnat             51.68476
## Col2a1           49.94588
## Rtn1             48.51060
## Map1b            44.71733
example <- system.file("extdata", 
                       "example_deepTools_MAT", 
                       package = "profileplyr")
# NOTE: could also use BamBigwig_to_chipProfile() and as_profileplyr()
# within R to start from bigwig/BAM and BED file. 

import_deepToolsMat(example) %>%
  .[ , , grepl("hindbrain|liver", names(assays(.)))] %>%
  annotateRanges_great(species = "mm10") %>%
  groupBy(group = gene_list_dataframe,
          inherit_groups = FALSE) %>%
  generateEnrichedHeatmap(extra_annotation_columns="hindbrain_liv_stat",
                          sample_names = c("Hindbrain", 
                                           "Liver"),
                          extra_anno_width = 8)

7 Combining multiple profileplyr functions for heatmap annotation

As a demonstration, we can combine many of the profileplyr functions and visualize more than one metadata column. From this figure we can see a number of interesting things from the data. Many of the regions with the highest ATAC-seq signal are in promoters for all clusters (the ‘annotation_short’ column). Cluster 2 appears to have many ranges with specific signal in the liver sample which is supported by the main heatmap, the high density of top liver K27ac peaks (red signal in the ‘GRoverlap_names’ column), and the high density of genes that have reduced gene expression in the liver vs the hindbrain (red signal in the ‘stat’ column). On the other hand Cluster 1 has specific hindbrain peaks, high overlap with hindbrain K27ac peaks, and higher expression of genes that have higher expression in the hindbrain. By extracting all of this information from the profileplyr object and displaying in a heatmap with one command, a lot can be gleaned from these data.

# NOTE: could also use BamBigwig_to_chipProfile() and as_profileplyr()
# within R to start from bigwig/BAM and BED file. 

set.seed(0)
import_deepToolsMat(example) %>%
  annotateRanges(TxDb = "mm10") %>% # annotate with ChIPseeker
  groupBy(group = gene_list_dataframe,
          include_nonoverlapping = TRUE,
          inherit_groups = FALSE) %>% # group by gene overlap
  groupBy(group = K27ac_GRlist,
          GRanges_names = K27ac_GRlist_names,
          include_nonoverlapping = TRUE,
          inherit_groups = FALSE) %>% # group by GRanges
  clusterRanges(kmeans_k = 3) %>%
  generateEnrichedHeatmap(extra_annotation_columns=c("annotation_short", 
                                                       "GRoverlap_names", 
                                                       "hindbrain_liv_stat"),
                          sample_names = c("Hindbrain",
                                           "Liver",
                                           "Kidney"),
                          extra_anno_width = c(5,5,8))

The generateEnrichedHeatmap() function allows for a great deal of customization of the heatmap that is built around the profileplyr object. Below we modify colors. The color of the heatmaps from the assay matrices is changed to a two color spectrum, and changes to the annotation heatmaps were made as well.

The color arguments for both the matrices and the extra annotation columns can either be a character vector if all heatmaps should have the same color scheme, or a list if each individual heatmap requires different colors. If a list is used, the list must be the same length as the number of heatmaps, and each element of the list maps to the corresponding matrix in the profileplyr object (for ‘matrices_color’) or to the corresponding element in the ‘extra_annotation_columns’ argument (for ‘extra_anno_color’). If the color of a particular heatmap is to be left unchanged, then that element of the list should be NULL.

# NOTE: could also use BamBigwig_to_chipProfile() and as_profileplyr()
# within R to start from bigwig/BAM and BED file. 

set.seed(0)
import_deepToolsMat(example) %>% 
  annotateRanges(TxDb = "mm10") %>% # annotate with ChIPseeker
  groupBy(group = gene_list_dataframe,
          include_nonoverlapping = TRUE,
          inherit_groups = FALSE) %>% # group by gene overlap
  groupBy(group = K27ac_GRlist,
          GRanges_names = K27ac_GRlist_names,
          include_nonoverlapping = TRUE,
          inherit_groups = FALSE) %>% # group by GRanges
  clusterRanges(kmeans_k = 3) %>%
  generateEnrichedHeatmap(extra_annotation_columns=c("annotation_short", 
                                                       "GRoverlap_names", 
                                                       "hindbrain_liv_stat"),
                          sample_names = c("Hindbrain",
                                           "Liver",
                                           "Kidney"),
                          extra_anno_width = c(5,5,8),
                          matrices_color = c("white", "red"),
                          extra_anno_color = list(NULL,
                                                  c("black",
                                                    "red", 
                                                    "green",
                                                    "white"),
                                                  c("#e7e1ef", 
                                                    "#c994c7", 
                                                    "#dd1c77")
                                                  ))

8 Session info

sessionInfo()
## R Under development (unstable) (2019-03-18 r76245)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] org.Mm.eg.db_3.7.0          AnnotationDbi_1.45.1       
##  [3] pheatmap_1.0.12             EnrichedHeatmap_1.13.2     
##  [5] ComplexHeatmap_1.99.7       soGGi_1.15.0               
##  [7] SummarizedExperiment_1.13.0 DelayedArray_0.9.8         
##  [9] BiocParallel_1.17.17        matrixStats_0.54.0         
## [11] Biobase_2.43.1              ggplot2_3.1.0              
## [13] rtracklayer_1.43.3          GenomicRanges_1.35.1       
## [15] GenomeInfoDb_1.19.2         IRanges_2.17.4             
## [17] S4Vectors_0.21.20           BiocGenerics_0.29.2        
## [19] magrittr_1.5                BiocStyle_2.11.0           
## [21] rmarkdown_1.12              profileplyr_0.8.9          
## 
## loaded via a namespace (and not attached):
##   [1] ChIPseeker_1.19.1                       
##   [2] circlize_0.4.5                          
##   [3] fastmatch_1.1-0                         
##   [4] plyr_1.8.4                              
##   [5] igraph_1.2.4                            
##   [6] lazyeval_0.2.2                          
##   [7] splines_3.6.0                           
##   [8] gridBase_0.4-7                          
##   [9] urltools_1.7.2                          
##  [10] digest_0.6.18                           
##  [11] htmltools_0.3.6                         
##  [12] GOSemSim_2.9.1                          
##  [13] viridis_0.5.1                           
##  [14] GO.db_3.7.0                             
##  [15] gdata_2.18.0                            
##  [16] memoise_1.1.0                           
##  [17] cluster_2.0.7-1                         
##  [18] Biostrings_2.51.5                       
##  [19] R.utils_2.8.0                           
##  [20] enrichplot_1.3.2                        
##  [21] prettyunits_1.0.2                       
##  [22] colorspace_1.4-1                        
##  [23] blob_1.1.1                              
##  [24] ggrepel_0.8.0                           
##  [25] xfun_0.5                                
##  [26] dplyr_0.8.0.1                           
##  [27] crayon_1.3.4                            
##  [28] RCurl_1.95-4.12                         
##  [29] jsonlite_1.6                            
##  [30] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
##  [31] chipseq_1.33.3                          
##  [32] glue_1.3.1                              
##  [33] polyclip_1.10-0                         
##  [34] gtable_0.3.0                            
##  [35] zlibbioc_1.29.0                         
##  [36] XVector_0.23.2                          
##  [37] UpSetR_1.3.3                            
##  [38] GetoptLong_0.1.7                        
##  [39] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0 
##  [40] shape_1.4.4                             
##  [41] scales_1.0.0                            
##  [42] DOSE_3.9.4                              
##  [43] DBI_1.0.0                               
##  [44] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 
##  [45] Rcpp_1.0.1                              
##  [46] plotrix_3.7-4                           
##  [47] viridisLite_0.3.0                       
##  [48] progress_1.2.0                          
##  [49] clue_0.3-57                             
##  [50] gridGraphics_0.3-0                      
##  [51] bit_1.1-14                              
##  [52] europepmc_0.3                           
##  [53] preprocessCore_1.45.0                   
##  [54] httr_1.4.0                              
##  [55] fgsea_1.9.5                             
##  [56] gplots_3.0.1.1                          
##  [57] RColorBrewer_1.1-2                      
##  [58] pkgconfig_2.0.2                         
##  [59] XML_3.98-1.19                           
##  [60] R.methodsS3_1.7.1                       
##  [61] farver_1.1.0                            
##  [62] locfit_1.5-9.1                          
##  [63] labeling_0.3                            
##  [64] ggplotify_0.0.3                         
##  [65] tidyselect_0.2.5                        
##  [66] rlang_0.3.2                             
##  [67] reshape2_1.4.3                          
##  [68] munsell_0.5.0                           
##  [69] tools_3.6.0                             
##  [70] RSQLite_2.1.1                           
##  [71] ggridges_0.5.1                          
##  [72] evaluate_0.13                           
##  [73] stringr_1.4.0                           
##  [74] yaml_2.2.0                              
##  [75] knitr_1.22                              
##  [76] org.Hs.eg.db_3.7.0                      
##  [77] bit64_0.9-7                             
##  [78] caTools_1.17.1.2                        
##  [79] purrr_0.3.2                             
##  [80] ggraph_1.0.2                            
##  [81] R.oo_1.22.0                             
##  [82] DO.db_2.9                               
##  [83] xml2_1.2.0                              
##  [84] biomaRt_2.39.2                          
##  [85] compiler_3.6.0                          
##  [86] rstudioapi_0.10                         
##  [87] png_0.1-7                               
##  [88] tibble_2.1.1                            
##  [89] tweenr_1.0.1                            
##  [90] stringi_1.4.3                           
##  [91] GenomicFeatures_1.35.9                  
##  [92] lattice_0.20-38                         
##  [93] Matrix_1.2-16                           
##  [94] pillar_1.3.1                            
##  [95] BiocManager_1.30.4                      
##  [96] triebeard_0.3.0                         
##  [97] GlobalOptions_0.1.0                     
##  [98] data.table_1.12.0                       
##  [99] cowplot_0.9.4                           
## [100] bitops_1.0-6                            
## [101] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
## [102] qvalue_2.15.0                           
## [103] R6_2.4.0                                
## [104] latticeExtra_0.6-28                     
## [105] hwriter_1.3.2                           
## [106] bookdown_0.9                            
## [107] ShortRead_1.41.0                        
## [108] KernSmooth_2.23-15                      
## [109] gridExtra_2.3                           
## [110] boot_1.3-21                             
## [111] MASS_7.3-51.2                           
## [112] gtools_3.8.1                            
## [113] assertthat_0.2.1                        
## [114] rjson_0.2.20                            
## [115] withr_2.1.2                             
## [116] GenomicAlignments_1.19.1                
## [117] Rsamtools_1.99.4                        
## [118] GenomeInfoDbData_1.2.0                  
## [119] hms_0.4.2                               
## [120] rGREAT_1.15.1                           
## [121] tidyr_0.8.3                             
## [122] rvcheck_0.1.3                           
## [123] ggforce_0.2.1